On Differentially Private Longest Increasing Subsequence Computation in Data Stream

Authors

  • Luca Bonomi
  • Li Xiong
Abstract

Many important applications require continuous computation of statistics over data streams. Activity monitoring, surveillance, and fraud detection are some settings where monitoring applications must protect users' sensitive information in addition to efficiently computing the required statistics. In the last two decades, a broad range of techniques for time-series and stream data monitoring has been developed to provide provable privacy guarantees employing the formal notion of differential privacy. Although these solutions are well established, they are mostly limited to count-based statistics (e.g., number of distinct elements, heavy hitters) and do not apply in settings where more complex statistics are needed. In this paper, we consider the more general problem of estimating the sortedness of a data stream by privately computing the length of the longest increasing subsequence (LIS). This important statistic can be used to detect surprising trends in time-series data (e.g., in finance) and to perform approximate string matching in computational biology. Our proposed approaches employ differential privacy, which provides strong and provable privacy guarantees. Our solutions estimate the length of the LIS using block decomposition and local approximation techniques. We provide a rigorous analysis to bound the approximation error of our algorithms in terms of the privacy level and the length of the stream. Furthermore, we extend our solutions to computing the length of the LIS over sliding windows, and we show the beneficial effects of this formulation on the final utility. An extensive experimental evaluation of our proposed solutions on real-world data streams demonstrates the effectiveness of our approaches for computing accurate statistics and detecting surprising trends.
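
To make the statistic in the abstract concrete, the following is a minimal sketch of computing the LIS length and releasing it with Laplace noise. It is not the authors' block-decomposition algorithm: it is the generic Laplace mechanism, and the function names, the `sensitivity=1.0` default (which assumes neighboring streams differ in the value of a single element), and the `rng` parameter are all assumptions introduced here for illustration.

```python
import bisect
import random


def lis_length(stream):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []  # tails[i] = smallest tail of any increasing subsequence of length i + 1
    for x in stream:
        i = bisect.bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)   # x extends the longest subsequence seen so far
        else:
            tails[i] = x      # x gives a smaller tail for length i + 1
    return len(tails)


def noisy_lis_length(stream, epsilon, sensitivity=1.0, rng=random):
    """Generic Laplace mechanism applied to the LIS length.

    ASSUMPTION: sensitivity=1.0 corresponds to neighboring streams that
    differ in one element's value; the paper's privacy model may differ.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rate = epsilon / sensitivity
    # The difference of two i.i.d. exponentials with this rate is
    # Laplace-distributed with scale sensitivity / epsilon.
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return lis_length(stream) + noise
```

A smaller `epsilon` gives stronger privacy at the cost of a noisier estimate, which is the privacy/utility trade-off that the error bounds in the abstract refer to.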

Similar resources

Private Computation of the Longest Increasing Subsequence in Data Streams

In this paper, we study the problem of privately computing ordered statistics with the goal of monitoring sequential data streams. Despite the broad series of techniques for time-series monitoring, only a few works provide provable privacy guarantees employing the formal notion of differential privacy. While these solutions are well established, their focus is mostly limited to count-based statis...

Tight Lower Bounds for Multi-pass Stream Computation Via Pass Elimination

There is a natural relationship between lower bounds in the multi-pass stream model and lower bounds in multi-round communication. However, this connection is less understood than the connection between single-pass streams and one-way communication. In this paper, we consider data-stream problems for which reductions from natural multi-round communication problems do not yield tight bounds or d...

Finding Longest Increasing and Common Subsequences in Streaming Data

In this paper, we present algorithms and lower bounds for the Longest Increasing Subsequence (LIS) and Longest Common Subsequence (LCS) problems in the data streaming model. For the problem of deciding whether the LIS of a given stream of integers drawn from {1, ..., m} has length at least k, we discuss a one-pass streaming algorithm using O(k log m) space, with update time either O(log k) or...
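
The decision problem described above can be illustrated with a short one-pass sketch. It keeps at most k values (each an integer from {1, ..., m}, hence O(k log m) bits) and spends O(log k) per element on a binary search. This is a standard patience-sorting variant written for illustration, not necessarily the algorithm from that paper, and the function name `lis_at_least` is hypothetical.

```python
import bisect


def lis_at_least(stream, k):
    """Return True if the stream has a strictly increasing subsequence of length >= k.

    One pass over the stream; the state never holds more than k values.
    """
    if k <= 0:
        return True
    tails = []  # tails[i] = smallest tail of an increasing subsequence of length i + 1
    for x in stream:
        i = bisect.bisect_left(tails, x)  # O(log k), since len(tails) < k here
        if i == len(tails):
            tails.append(x)
            if len(tails) >= k:
                return True  # stop early: no longer subsequences need tracking
        else:
            tails[i] = x
    return False
```

Because the function consumes any iterable and returns as soon as the threshold is reached, it can be fed a generator that reads the stream element by element.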

Cell-probe bounds for online edit distance and other pattern matching problems

We give cell-probe bounds for the computation of edit distance, Hamming distance, convolution and longest common subsequence in a stream. In this model, a fixed string of n symbols is given and one δ-bit symbol arrives at a time in a stream. After each symbol arrives, the distance between the fixed string and a suffix of most recent symbols of the stream is reported. The cell-probe model is per...

A note on randomized streaming space bounds for the longest increasing subsequence problem

The deterministic space complexity of approximating the length of the longest increasing subsequence of a stream of N integers is known to be Θ̃(√N). However, the randomized complexity is wide open. We show that the technique used in earlier work to establish the Ω(√N) deterministic lower bound fails strongly under randomization: specifically, we show that the communication problems on which...

Journal:
  • Transactions on Data Privacy

Volume 9

Publication date: 2016